Statistical Parsing with a Grammar Acquired from a Bracketed Corpus Based on Clustering Analysis
Abstract
This paper proposes a new method for learning a context-sensitive conditional probability context-free grammar from an unlabeled bracketed corpus based on clustering analysis, and describes a natural language parsing model which uses a probability-based scoring function of the grammar to rank parses of a sentence. By grouping brackets in a corpus into a number of similar bracket groups based on their local contextual information, the corpus is automatically labeled with nonterminal labels, and consequently a grammar with conditional probabilities is acquired. The statistical parsing model provides a framework for finding the most likely parse of a sentence based on these conditional probabilities. Experiments using Wall Street Journal data show that our approach achieves a relatively high accuracy: 88% recall, 72% precision and 0.7 crossing brackets per sentence for sentences shorter than 10 words, and 71% recall, 51% precision and 3.4 crossing brackets for sentences of 10-19 words. This result supports the assumption that local contextual statistics obtained from an unlabeled bracketed corpus are effective for learning a useful grammar and parsing.
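The bracket recall, bracket precision and crossing-bracket figures quoted above can be illustrated with a minimal sketch. It assumes the common PARSEVAL-style definitions over unlabeled spans; the paper's exact scoring procedure is not given in this abstract and may differ. The names `Span`, `crosses` and `bracket_scores` are hypothetical and chosen only for this example.

```python
from typing import Set, Tuple

# Assumption: a bracket is an unlabeled span (start, end) over word
# positions, with end exclusive. This representation is illustrative only.
Span = Tuple[int, int]


def crosses(a: Span, b: Span) -> bool:
    """True if the spans overlap without either containing the other."""
    return a[0] < b[0] < a[1] < b[1] or b[0] < a[0] < b[1] < a[1]


def bracket_scores(predicted: Set[Span], gold: Set[Span]) -> Tuple[float, float, int]:
    """Return (precision, recall, crossing count) for one sentence."""
    matched = len(predicted & gold)
    precision = matched / len(predicted) if predicted else 0.0
    recall = matched / len(gold) if gold else 0.0
    # A predicted bracket counts once if it crosses any gold bracket.
    crossing = sum(1 for p in predicted if any(crosses(p, g) for g in gold))
    return precision, recall, crossing


# Toy five-word sentence: one predicted bracket, (1, 3), is wrong and
# crosses the gold bracket (0, 2).
gold = {(0, 5), (0, 2), (2, 5), (3, 5)}
predicted = {(0, 5), (0, 2), (1, 3), (3, 5)}
precision, recall, crossing = bracket_scores(predicted, gold)
```

In this toy case three of the four predicted brackets match the gold brackets, so precision and recall are both 0.75 and the crossing count is 1. Corpus-level figures such as those in the abstract would be obtained by aggregating over all test sentences.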
Similar resources
Grammar Acquisition Based on Clustering Analysis and Its Application to Statistical Parsing
This paper proposes a new method for learning a context-sensitive conditional probability context-free grammar from an unlabeled bracketed corpus based on clustering analysis and describes a natural language parsing model which uses a probability-based scoring function of the grammar to rank parses of a sentence. By grouping brackets in a corpus into a number of similar bracket groups based on ...
KEY WORDS-Statistical Parsing, Grammar Acquisition, Clustering Analysis, Local Contextual
This paper proposes a new method for learning a context-sensitive conditional probability context-free grammar from an unlabeled bracketed corpus based on clustering analysis and describes a natural language parsing model which uses a probability-based scoring function of the grammar to rank parses of a sentence. By grouping brackets in a corpus into a number of similar bracket groups based on ...
Statistical Parsing with a Grammar Acquired from a Bracketed Corpus Based on Clustering Analysis
This work proposes a new method for learning a context-sensitive conditional probability context-free grammar from an unlabeled bracketed corpus based on clustering analysis, and introduces a natural language parsing model which uses a probability-based scoring function of the grammar to rank parses of a sentence. The method is superior to previous works (i.e., [Collins, 1996]) in the followi...
Grammar Acquisition and Statistical Parsing by exploiting Local Contextual Information
This paper presents a method for inducing a context-sensitive conditional probability context-free grammar from an unlabeled bracketed corpus using local contextual information and describes a natural language parsing model which uses a probability-based scoring function of the grammar to rank parses of a sentence. This method uses clustering techniques to group brackets in a corpus into a numbe...
Towards Automatic Grammar Acquisition from a Bracketed Corpus
1 Introduction. Designing and refining a natural language grammar is a difficult and time-consuming task and requires a large amount of skilled effort. A hand-crafted grammar is usually not completely satisfactory and frequently fails to cover many unseen sentences. Automatic acquisition of grammars is a solution to this problem. Recently, with the increasing availability of large, mac...
Publication date: 1997